Cell type annotation with SCSA
Single-cell transcriptomics allows the analysis of thousands of cells in a single experiment and the identification of novel cell types, states and dynamics in a variety of tissues and organisms. Standard experimental protocols and analytical workflows have been developed to create single-cell transcriptomic maps from tissues.
This tutorial focuses on how to interpret this data to identify cell types, states, and other biologically relevant patterns with the goal of creating annotated cell maps.
Note
SCSA annotation is not suitable for annotating rare cell types.
Part 1. Data preprocessing
In this part, we perform preliminary processing of the data, such as normalization and log transformation, to make it more interpretable.
#import package
import anndata
print('anndata(Ver): ',anndata.__version__)
import scanpy as sc
print('scanpy(Ver): ',sc.__version__)
import matplotlib.pyplot as plt
import matplotlib
print('matplotlib(Ver): ',matplotlib.__version__)
import seaborn as sns
print('seaborn(Ver): ',sns.__version__)
import numpy as np
print('numpy(Ver): ',np.__version__)
import pandas as pd
print('pandas(Ver): ',pd.__version__)
import Pyomic
print('Pyomic(Ver): ',Pyomic.__version__)
anndata(Ver): 0.8.0
scanpy(Ver): 1.9.1
matplotlib(Ver): 3.5.3
seaborn(Ver): 0.11.2
numpy(Ver): 1.22.4
pandas(Ver): 1.4.3
Pyomic(Ver): 1.1.0
#param for visualization
sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.settings.set_figure_params(dpi=80, facecolor='white')
from matplotlib.colors import LinearSegmentedColormap
sc_color=['#7CBB5F','#368650','#A499CC','#5E4D9A','#78C2ED','#866017', '#9F987F','#E0DFED',
'#EF7B77', '#279AD7','#F0EEF0', '#1F577B', '#A56BA7', '#E0A7C8', '#E069A6', '#941456', '#FCBC10',
'#EAEFC5', '#01A0A7', '#75C8CC', '#F0D7BC', '#D5B26C', '#D5DA48', '#B6B812', '#9DC3C3', '#A89C92', '#FEE00C', '#FEF2A1']
sc_color_cmap = LinearSegmentedColormap.from_list('Custom', sc_color, len(sc_color))
#load data
adata=sc.read_h5ad('sample/rna.h5ad')
adata
AnnData object with n_obs × n_vars = 22679 × 25596
obs: 'Type'
#filter cells and genes
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
filtered out 1234 cells that have less than 200 genes expressed
filtered out 4074 genes that are detected in less than 3 cells
#calculate the proportion of mito-genes
adata.var['mt'] = adata.var_names.str.startswith('MT-') # annotate the group of mitochondrial genes as 'mt'
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
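To make the three QC columns concrete, here is a minimal sketch of how they can be computed from a dense cells x genes count matrix. The helper qc_metrics and the toy matrix are hypothetical; scanpy's calculate_qc_metrics also handles sparse matrices and computes per-gene metrics, so treat this as an illustration of the definitions, not its implementation. Note that pct_counts_mt is a percentage (0 to 100), not a fraction.

```python
import numpy as np
import pandas as pd

def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Per-cell QC metrics from a dense cells x genes count matrix (illustrative only)."""
    counts = np.asarray(counts, dtype=float)
    is_mt = np.array([g.startswith(mito_prefix) for g in gene_names])
    total = counts.sum(axis=1)              # assumes no empty cells (filtered above)
    total_mt = counts[:, is_mt].sum(axis=1)
    return pd.DataFrame({
        "n_genes_by_counts": (counts > 0).sum(axis=1),  # genes detected per cell
        "total_counts": total,
        "total_counts_mt": total_mt,
        "pct_counts_mt": 100 * total_mt / total,        # percentage, 0-100
    })

genes = ["MT-CO1", "MT-ND1", "EPCAM", "CD3D"]
counts = [[10, 0, 5, 5],    # 20 counts, 10 of them mitochondrial
          [0, 0, 30, 10]]   # 40 counts, none mitochondrial
df = qc_metrics(counts, genes)
print(df)
```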
#visualization of mito-genes and total-genes
sc.pl.violin(adata, ['n_genes_by_counts', 'total_counts', 'pct_counts_mt'],
jitter=0.4, multi_panel=True)
Low gene counts and a high proportion of mitochondrial gene counts usually indicate poor cell quality. However, some cell types, such as proximal and distal renal tubule cells, are inherently rich in mitochondria.
# visualize mitochondrial percentage and detected genes against total counts
fig = plt.figure(figsize=(8,4))
ax1=plt.subplot(1,2,1)
sc.pl.scatter(adata, x='total_counts', y='pct_counts_mt',ax=ax1,show=False)
ax2=plt.subplot(1,2,2)
sc.pl.scatter(adata, x='total_counts', y='n_genes_by_counts',ax=ax2,show=False)
plt.tight_layout()
Here, we found that the vast majority of cells had a mitochondrial percentage below 20%. To ensure data quality, we used a stricter mitochondrial cutoff of 10%. Most cells also had fewer than 2,000 detected genes, so we kept cells with n_genes_by_counts < 2000 and pct_counts_mt < 10.
adata = adata[adata.obs.n_genes_by_counts < 2000, :]
adata = adata[adata.obs.pct_counts_mt < 10, :]
adata
View of AnnData object with n_obs × n_vars = 16977 × 21522
obs: 'Type', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt'
var: 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts'
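The two cutoffs can be written as one boolean mask over the per-cell QC table. This sketch uses a hypothetical four-cell table in place of adata.obs. Applying such a mask to an AnnData object returns a view; calling .copy() on the result should avoid the "Received a view of an AnnData" warning that normalize_total emits below.

```python
import pandas as pd

# Hypothetical per-cell QC table standing in for adata.obs
obs = pd.DataFrame({
    "n_genes_by_counts": [850, 2400, 1200, 600],
    "pct_counts_mt":     [4.2, 3.1, 18.5, 9.9],
}, index=["cell_a", "cell_b", "cell_c", "cell_d"])

# A cell is kept only if it passes BOTH cutoffs used in the tutorial
keep = (obs["n_genes_by_counts"] < 2000) & (obs["pct_counts_mt"] < 10)
print(keep.sum(), "of", len(obs), "cells kept:", list(obs.index[keep]))
# prints: 2 of 4 cells kept: ['cell_a', 'cell_d']

# With an AnnData object, the same mask would be applied as:
# adata = adata[keep.values, :].copy()
```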
# normalization: scale each cell to 1e4 (10,000) total counts; the raw total_counts reach about 20,000, so 1e4 is of the same order of magnitude
sc.pp.normalize_total(adata, target_sum=1e4)
/Users/fernandozeng/miniforge3/envs/django/lib/python3.8/site-packages/scanpy/preprocessing/_normalization.py:170: UserWarning: Received a view of an AnnData. Making a copy. view_to_actual(adata)
normalizing counts per cell
finished (0:00:00)
#log
sc.pp.log1p(adata)
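The two steps above amount to rescaling each cell's counts so that they sum to target_sum and then applying log(1 + x). A minimal NumPy sketch on a dense toy matrix; normalize_log1p is a hypothetical helper, and options of scanpy's normalize_total such as excluding highly expressed genes are not reproduced here.

```python
import numpy as np

def normalize_log1p(counts, target_sum=1e4):
    counts = np.asarray(counts, dtype=float)
    size = counts.sum(axis=1, keepdims=True)   # total counts per cell
    normalized = counts / size * target_sum    # every row now sums to target_sum
    return np.log1p(normalized)                # natural log of (1 + x)

counts = np.array([[1, 3, 0, 16],     # 20 counts in total
                   [50, 0, 50, 0]])   # 100 counts in total
X = normalize_log1p(counts)
# Undoing the log recovers normalized counts that sum to 1e4 per cell
print(np.expm1(X).sum(axis=1))
```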
# select highly variable genes
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
extracting highly variable genes
finished (0:00:02)
--> added
'highly_variable', boolean vector (adata.var)
'means', float vector (adata.var)
'dispersions', float vector (adata.var)
'dispersions_norm', float vector (adata.var)
# store the log-normalized data of all genes in .raw, then keep only highly variable genes
adata.raw = adata
adata = adata[:, adata.var.highly_variable]
# regression: remove the effects of total counts and mitochondrial percentage from each gene's expression
sc.pp.regress_out(adata, ['total_counts', 'pct_counts_mt'])
regressing out ['total_counts', 'pct_counts_mt']
finished (0:00:15)
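regress_out fits, for each gene, a linear model of its expression on the listed covariates and keeps the residuals. The sketch below assumes ordinary least squares with an intercept, which we believe matches scanpy's default for continuous covariates; the helper name and the simulated data are hypothetical. Because OLS residuals are orthogonal to the regressors, the corrected values no longer correlate with the covariates.

```python
import numpy as np

def regress_out(X, covariates):
    """Replace each gene (column of X) by the residuals of an OLS fit on the covariates."""
    X = np.asarray(X, dtype=float)
    C = np.column_stack([np.ones(len(X)), np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(C, X, rcond=None)  # one coefficient vector per gene
    return X - C @ beta

rng = np.random.default_rng(0)
pct_mt = rng.uniform(0, 10, size=200)
total = rng.uniform(1000, 5000, size=200)
signal = rng.normal(size=(200, 3))
# simulated expression contaminated by both covariates
X = signal + 0.5 * pct_mt[:, None] + 0.001 * total[:, None]
R = regress_out(X, np.column_stack([total, pct_mt]))
print(np.corrcoef(R[:, 0], pct_mt)[0, 1])  # close to 0
```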
#scale
sc.pp.scale(adata, max_value=10)
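scale standardizes each gene to zero mean and unit variance, and max_value=10 caps the result. A sketch, assuming (as we understand scanpy's behavior) that only values above max_value are clipped and that constant genes are left at zero instead of being divided by a zero standard deviation:

```python
import numpy as np

def scale(X, max_value=10):
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1                    # constant genes: avoid dividing by zero
    Z = (X - mean) / std                 # per-gene z-scores
    return np.clip(Z, None, max_value)   # cap only the upper tail

X = np.zeros((201, 2))
X[0, 0] = 100.0   # one outlier cell in gene 0
X[:, 1] = 5.0     # gene 1 is constant
Z = scale(X)
print(Z[0, 0], Z[1, 0], Z[0, 1])
```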
#pca analysis
sc.tl.pca(adata, n_comps=100, svd_solver="auto")
computing PCA
Note that scikit-learn's randomized PCA might not be exactly reproducible across different computational platforms. For exact reproducibility, choose `svd_solver='arpack'.`
on highly variable genes
with n_comps=100
finished (0:00:03)
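Conceptually, PCA on the scaled highly variable genes is a singular value decomposition of the centered matrix. The sketch below uses an exact SVD; scanpy's arpack or randomized solvers should give approximately the same components, up to sign. The pca helper and the toy data are illustrative only.

```python
import numpy as np

def pca(X, n_comps):
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_comps] * S[:n_comps]          # cell coordinates (like X_pca)
    explained = S**2 / np.sum(S**2)                # variance ratio per component
    return scores, Vt[:n_comps], explained[:n_comps]

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 1))
# three "genes" driven by one latent factor plus a little noise
X = latent @ np.array([[3.0, 2.0, 1.0]]) + 0.1 * rng.normal(size=(300, 3))
scores, components, ratio = pca(X, n_comps=2)
print(ratio)   # the first component explains almost all variance
```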
# construct the cell neighborhood graph
sc.pp.neighbors(adata, n_neighbors=15, random_state = 112, n_pcs=50)
computing neighbors
using 'X_pca' with n_pcs = 50
finished: added to `.uns['neighbors']`
`.obsp['distances']`, distances for each pair of neighbors
`.obsp['connectivities']`, weighted adjacency matrix (0:00:09)
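The neighbor graph links each cell to its nearest cells in the first n_pcs principal components. The brute-force sketch below returns the indices of the k nearest other cells by Euclidean distance; it does not reproduce scanpy's approximate nearest-neighbor search or the UMAP-style weights stored in .obsp['connectivities']. The function name knn_indices is hypothetical.

```python
import numpy as np

def knn_indices(X_pca, k, n_pcs=50):
    Y = np.asarray(X_pca, dtype=float)[:, :n_pcs]   # use only the first n_pcs PCs
    sq = np.sum(Y**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * Y @ Y.T     # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                    # exclude the cell itself
    return np.argsort(d2, axis=1)[:, :k]            # k nearest other cells per row

coords = np.array([[0.0], [1.0], [3.0], [10.0], [12.0]])
print(knn_indices(coords, k=1).ravel())  # -> [1 0 1 4 3]
```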
# Leiden clustering, PAGA graph, and UMAP embedding initialized with the PAGA positions
sc.tl.leiden(adata)
sc.tl.paga(adata)
sc.pl.paga(adata, plot=False) # remove `plot=False` if you want to see the coarse-grained graph
sc.tl.umap(adata, init_pos='paga')
running Leiden clustering
finished: found 19 clusters and added
'leiden', the cluster labels (adata.obs, categorical) (0:00:01)
running PAGA
finished: added
'paga/connectivities', connectivities adjacency (adata.uns)
'paga/connectivities_tree', connectivities subtree (adata.uns) (0:00:00)
--> added 'pos', the PAGA positions (adata.uns['paga'])
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:06)
This completes the data preprocessing. In the following analysis, we annotate the cell types in the breast cancer data.
Part 2. Cell type annotation
Based on the study "A single-cell and spatially resolved atlas of human breast cancers", we gained a preliminary understanding of the cell types in breast cancer and compiled the following marker dictionary: epithelial cells (EPCAM), proliferating cells (MKI67), T cells (CD3D), myeloid cells (CD68), B cells (MS4A1), plasmablasts (JCHAIN), endothelial cells (PECAM1) and mesenchymal cells (fibroblasts/perivascular-like cells; PDGFRB).
marker_genes_dict = {
'Epithelial': ['EPCAM'],
'Proliferating': ['MKI67'],
'T-cell': ['CD3D'],
'Myeloid':['CD68'],
'B-cell':['MS4A1'],
'Plasmablasts':['JCHAIN'],
'Endothelial':['PECAM1'],
'Mesenchymal':['PDGFRB']
}
sc.pl.dotplot(adata, marker_genes_dict, 'leiden', dendrogram=True)
WARNING: dendrogram data not found (using key=dendrogram_leiden). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
using 'X_pca' with n_pcs = 100
Storing dendrogram info using `.uns['dendrogram_leiden']`
WARNING: Groups are not reordered because the `groupby` categories and the `var_group_labels` are different.
categories: 0, 1, 2, etc.
var_group_labels: Epithelial, Proliferating, T-cell, etc.
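Each dot in the plot summarizes one marker in one cluster. As we understand scanpy's dotplot defaults, the color encodes the mean expression across all cells in the cluster and the dot size the fraction of cells in which the gene is detected (expression above 0). A small pandas sketch of those two summaries, using a hypothetical helper and a four-cell toy matrix:

```python
import numpy as np
import pandas as pd

def dotplot_stats(expr, groups, genes):
    df = pd.DataFrame(np.asarray(expr, dtype=float), columns=genes)
    groups = pd.Series(list(groups), name="group")
    mean_expr = df.groupby(groups).mean()          # dot color
    frac_expr = (df > 0).groupby(groups).mean()    # dot size
    return mean_expr, frac_expr

expr = [[2.0, 0.0],
        [0.0, 0.0],
        [1.0, 3.0],
        [0.0, 1.0]]
mean_expr, frac_expr = dotplot_stats(expr, ["0", "0", "1", "1"], ["CD3D", "MS4A1"])
print(mean_expr)
print(frac_expr)
```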
Most cells in our data expressed T-cell or B-cell markers, with a small number of plasma cells, while clusters 11, 15 and 18 could not be assigned to any of these cell types. This suggests that the markers from this study do not fully capture the cell types in our data, so we used an automated approach to pre-annotate the cells.
dat=Pyomic.single.data_preprocess(adata)
ranking genes
finished (0:00:42)
SCSA is a cell type annotation tool for scRNA-seq data. Most current methods annotate cell types manually after clustering, which is labor-intensive, relies heavily on the user's expertise, and can lead to inconsistent results. SCSA instead annotates cell types automatically, using a score annotation model that combines the confidence levels of differentially expressed genes with cell markers from a database. According to its authors, evaluation on real scRNA-seq datasets shows that SCSA assigns cells to the correct type with the desired accuracy in a fully automated mode. https://www.frontiersin.org/articles/10.3389/fgene.2020.00490/full
anno=Pyomic.single.cell_annotate(dat,foldchange=1.5,pvalue=0.01,
output='temp/rna_anno.txt',outfmt='txt')
......Loading dataset from temp/whole.db ......Auto annotate cell Version V1.1 [2020/07/03] DB load: 47347 3 3 48257 37440 Namespace(Gensymbol=True, MarkerDB=None, celltype='normal', cluster='all', db='temp/whole.db', fc='/Users/fernandozeng/Library/Jupyter/runtime/kernel-27f433f3-add7-45cc-bde0-eedb07f187e1.json', foldchange=1.5, input='temp/rna.csv', list_tissue=False, noprint=True, norefdb=False, outfmt='txt', output='temp/rna_anno.txt', pvalue=0.01, source='scanpy', species='Human', target='cellmarker', tissue='All', weight=100.0) Version V1.1 [2020/07/03] DB load: 47347 3 3 48257 37440 load markers: 45409 Cluster 0 Gene number: 4
Other Gene number: 1246 Cluster 1 Gene number: 152
Other Gene number: 1185 Cluster 10 Gene number: 5
Other Gene number: 1246 Cluster 11 Gene number: 488
Other Gene number: 1015 Cluster 12 Gene number: 494
Other Gene number: 909 Cluster 13 Gene number: 45
Other Gene number: 1221 Cluster 14 Gene number: 12
Other Gene number: 1247 Cluster 15 Gene number: 255
Other Gene number: 1137 Cluster 16 Gene number: 5
Other Gene number: 1245 Cluster 17 Gene number: 210
Other Gene number: 1170 Cluster 18 Gene number: 73
Other Gene number: 1219 Cluster 2 Gene number: 9
Other Gene number: 1245 Cluster 3 Gene number: 45
Other Gene number: 1215 Cluster 4 Gene number: 28
Other Gene number: 1234 Cluster 5 Gene number: 24
Other Gene number: 1248 Cluster 6 Gene number: 42
Other Gene number: 1247 Cluster 7 Gene number: 4
Other Gene number: 1245 !WARNING3:Zero marker sets found, type:marker !WARNING3:Change the threshold or tissue name and try again? !WARNING3:EnsemblID or GeneID,try '-E' command? Cluster 8 Gene number: 3
Other Gene number: 1248
!WARNING3:Zero marker sets found, type:marker
!WARNING3:Change the threshold or tissue name and try again?
!WARNING3:EnsemblID or GeneID,try '-E' command?
Cluster 9 Gene number: 2
Other Gene number: 1248
#Cluster Type Celltype Score Times
['0', '?', 'Natural killer T (NKT) cell|T cell', '2.447392444959642|2.2461911469116336', 1.089574432845863]
['1', 'Good', 'B cell', 10.954903686447913, 8.677250051939577]
['10', '?', 'Meiotic prophase fetal germ cell|Oocyte', '0.7302967433402214|0.7302967433402214', 1.0]
['11', '?', 'Macrophage|Monocyte', '8.369949073369517|7.370814415247126', 1.1355528170748133]
['12', '?', 'Natural killer T (NKT) cell|Neural progenitor cell', '8.34407144261642|4.555967568124907', 1.8314597981325351]
['13', '?', 'Epithelial cell|Stem cell', '4.37028693226037|3.154595510368045', 1.385371569158954]
['14', 'Good', 'B cell', 4.984011226848826, 6.443265575810934]
['15', '?', 'Myeloid dendritic cell|Natural killer T (NKT) cell', '6.910322051184151|3.7755585461125585', 1.8302780811859611]
['16', 'Good', 'Oogenesis phase fetal germ cell', 3.2270652069044625, 12.357776452304249]
['17', 'Good', 'Plasmacytoid dendritic cell', 9.845745694780025, 3.471435593180278]
['18', '?', 'Mast cell|Mesenchymal stem cell', '6.89321552056096|4.370950038214283', 1.5770520047804364]
['2', '?', 'Natural killer cell|T helper17 (Th17) cell', '4.665951578149348|3.2675084877974796', 1.4279845318150994]
['3', '?', 'Natural killer cell|T cell', '4.271491275129722|3.808532363699675', 1.1215583503615867]
['4', 'Good', 'Regulatory T (Treg) cell', 6.791807050096626, 10.29594628279838]
['5', 'Good', 'B cell', 7.692165224114249, 7.349980191686429]
['6', 'Good', 'B cell', 8.11325164326611, 7.20754805244557]
['7', 'Good', 'Meiotic prophase fetal germ cell', 1.4242345298173182, 5.632591837080429]
['8', 'N', '-', '-', '-']
['9', 'N', '-', '-', '-']
#print
Pyomic.single.cell_anno_print(anno)
Cluster:0 Cell_type:Natural killer T (NKT) cell|T cell Z-score:2.447|2.246
Nice:Cluster:1 Cell_type:B cell Z-score:10.955
Cluster:2 Cell_type:Natural killer cell|T helper17 (Th17) cell Z-score:4.666|3.268
Cluster:3 Cell_type:Natural killer cell|T cell Z-score:4.271|3.809
Nice:Cluster:4 Cell_type:Regulatory T (Treg) cell Z-score:6.792
Nice:Cluster:5 Cell_type:B cell Z-score:7.692
Nice:Cluster:6 Cell_type:B cell Z-score:8.113
Nice:Cluster:7 Cell_type:Meiotic prophase fetal germ cell Z-score:1.424
Cluster:10 Cell_type:Meiotic prophase fetal germ cell|Oocyte Z-score:0.73|0.73
Cluster:11 Cell_type:Macrophage|Monocyte Z-score:8.37|7.371
Cluster:12 Cell_type:Natural killer T (NKT) cell|Neural progenitor cell Z-score:8.344|4.556
Cluster:13 Cell_type:Epithelial cell|Stem cell Z-score:4.37|3.155
Nice:Cluster:14 Cell_type:B cell Z-score:4.984
Cluster:15 Cell_type:Myeloid dendritic cell|Natural killer T (NKT) cell Z-score:6.91|3.776
Nice:Cluster:16 Cell_type:Oogenesis phase fetal germ cell Z-score:3.227
Nice:Cluster:17 Cell_type:Plasmacytoid dendritic cell Z-score:9.846
Cluster:18 Cell_type:Mast cell|Mesenchymal stem cell Z-score:6.893|4.371
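Clusters prefixed with "Nice" (Type "Good" in the result table) received a single cell type annotation, which is usually reliable. For the other clusters (Type "?"), SCSA lists the two best-scoring candidate cell types, separated by |.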
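The Type column of the result table can be used to split clusters into confident ("Good"), ambiguous ("?") and unassigned ("N") groups. Because the structure of the object returned by cell_annotate is not shown in this tutorial, this sketch copies the first three fields of each row from the log above by hand; it is an illustration, not part of the Pyomic API.

```python
# (cluster, Type, Celltype) copied by hand from the SCSA result table above
scsa_rows = [
    ("0", "?", "Natural killer T (NKT) cell|T cell"),
    ("1", "Good", "B cell"),
    ("10", "?", "Meiotic prophase fetal germ cell|Oocyte"),
    ("11", "?", "Macrophage|Monocyte"),
    ("12", "?", "Natural killer T (NKT) cell|Neural progenitor cell"),
    ("13", "?", "Epithelial cell|Stem cell"),
    ("14", "Good", "B cell"),
    ("15", "?", "Myeloid dendritic cell|Natural killer T (NKT) cell"),
    ("16", "Good", "Oogenesis phase fetal germ cell"),
    ("17", "Good", "Plasmacytoid dendritic cell"),
    ("18", "?", "Mast cell|Mesenchymal stem cell"),
    ("2", "?", "Natural killer cell|T helper17 (Th17) cell"),
    ("3", "?", "Natural killer cell|T cell"),
    ("4", "Good", "Regulatory T (Treg) cell"),
    ("5", "Good", "B cell"),
    ("6", "Good", "B cell"),
    ("7", "Good", "Meiotic prophase fetal germ cell"),
    ("8", "N", "-"),
    ("9", "N", "-"),
]

by_type = {}
for cluster, kind, _ in scsa_rows:
    by_type.setdefault(kind, []).append(cluster)
for clusters in by_type.values():
    clusters.sort(key=int)  # numeric order, so "2" comes before "10"

# Ambiguous clusters: the two candidate cell types to check with marker genes
candidates = {c: ct.split("|") for c, kind, ct in scsa_rows if kind == "?"}

for kind, clusters in by_type.items():
    print(kind, clusters)
```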
SCSA confidently annotated Leiden clusters 1, 4, 5, 6, 7, 14, 16 and 17. How should we determine the clusters marked "?" (and clusters 8 and 9, which received no annotation)? Here, we use the COSG algorithm to extract the marker genes of each Leiden cluster.
Pyomic.single.cosg(adata, key_added='cosg', groupby='leiden')
**finished identifying marker genes by COSG**
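COSG ranks marker genes by the cosine similarity between each gene's expression across cells and an idealized marker pattern for each cluster. The sketch below shows only that core idea, scoring each gene against a 0/1 cluster-indicator vector; as we understand it, the published COSG score additionally penalizes expression in other clusters, which is omitted here. The function name and the toy data are hypothetical.

```python
import numpy as np

def cosine_marker_scores(X, labels):
    """Cosine similarity between each gene and each cluster's 0/1 indicator vector."""
    X = np.asarray(X, dtype=float)                  # cells x genes, non-negative
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    indicator = (labels[:, None] == clusters[None, :]).astype(float)  # cells x clusters
    gene_norm = np.linalg.norm(X, axis=0)
    ind_norm = np.linalg.norm(indicator, axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        sim = (X.T @ indicator) / np.outer(gene_norm, ind_norm)
    return clusters, np.nan_to_num(sim)             # genes x clusters

# gene 0 is specific to cluster a, gene 1 to cluster b, gene 2 is uniform
X = np.array([[5.0, 0.0, 1.0],
              [4.0, 0.0, 1.0],
              [0.0, 3.0, 1.0],
              [0.0, 6.0, 1.0]])
labels = ["a", "a", "b", "b"]
clusters, sim = cosine_marker_scores(X, labels)
top_marker = {str(c): int(np.argmax(sim[:, j])) for j, c in enumerate(clusters)}
print(top_marker)  # {'a': 0, 'b': 1}
```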
# create a dictionary to map cluster to annotation label
cluster2annotation = {
'0': 'Germ-cell(Oid)',
'1': 'B-cell',
'2': 'T-cell',
'3': 'NK-cell',
'4': 'T-cell',
'5': 'B-cell',
'6': 'B-cell',
'7': 'Germ-cell(Oid)',
'8': 'Germ-cell(Oid)',
'9': 'Beta-cell(Oid)',
'10': 'Germ-cell(Oid)',
'11': 'Dendritic',
'12': 'T-cell',
'13': 'Acinar-cell',
'14': 'B-cell',
'15': 'Dendritic',
'16': 'Germ-cell(Oid)',
'17': 'Dendritic',
'18': 'Mast-cell',
}
adata.obs['major_celltype'] = adata.obs['leiden'].map(cluster2annotation).astype('category')
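Series.map leaves any cluster that is missing from the dictionary as NaN, which is easy to miss if the number of Leiden clusters changes between runs. A small safeguard, sketched on a hypothetical plain-string Series rather than the categorical adata.obs['leiden']:

```python
import pandas as pd

cluster2annotation = {"0": "Germ-cell(Oid)", "1": "B-cell", "2": "T-cell"}
leiden = pd.Series(["0", "1", "2", "1", "19"], name="leiden")  # "19" has no label

major = leiden.map(cluster2annotation).astype("category")
unmapped = sorted(set(leiden[major.isna()]))
if unmapped:
    print("clusters without a label:", unmapped)  # clusters without a label: ['19']
```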
Part 3. Cell subtype annotation
In the previous step, we annotated the major cell types. However, each major cell type is composed of different subtypes, so we further subdivide each cell type in the same way as before.
# create a dictionary to map cluster to annotation label
clusterannotation = {
'0': 'BRCA-NK-Germ-cell(oid)',
'1': 'B-cell',
'2': 'Gamma-delta-T-cell',
'3': 'NK-cell',
'4': 'Treg-cell',
'5': 'B-cell',
'6': 'B-cell',
'7': 'BRCA-Meiotic-prophase-fetal-Germ-cell(oid)',
'8': 'BRCA-Meiotic-prophase-fetal-Germ-cell(oid)',
'9': 'BRCA-Beta-cell(oid)',
'10': 'BRCA-Meiotic-prophase-fetal-Germ-cell(oid)',
'11': 'Dendritic-cell',
'12': 'NKT-cell',
'13': 'Acinar-cell',
'14': 'B-cell',
'15': 'Myeloid-dendritic-cell',
'16': 'BRCA-Oogenesis-phase-fetal-Germ-cell(oid)',
'17': 'Plasmacytoid-dendritic-cell',
'18': 'Mast-cell',
}
adata.obs['minor_celltype'] = adata.obs['leiden'].map(clusterannotation).astype('category')
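A quick consistency check is that every minor cell type falls under exactly one major cell type. This sketch runs the check on the cluster-level labels copied from the two dictionaries above; on real data, the same crosstab can be applied to adata.obs['minor_celltype'] and adata.obs['major_celltype'].

```python
import pandas as pd

# (leiden cluster, major_celltype, minor_celltype), copied from the dictionaries above
pairs = [
    ("0", "Germ-cell(Oid)", "BRCA-NK-Germ-cell(oid)"),
    ("1", "B-cell", "B-cell"),
    ("2", "T-cell", "Gamma-delta-T-cell"),
    ("3", "NK-cell", "NK-cell"),
    ("4", "T-cell", "Treg-cell"),
    ("5", "B-cell", "B-cell"),
    ("6", "B-cell", "B-cell"),
    ("7", "Germ-cell(Oid)", "BRCA-Meiotic-prophase-fetal-Germ-cell(oid)"),
    ("8", "Germ-cell(Oid)", "BRCA-Meiotic-prophase-fetal-Germ-cell(oid)"),
    ("9", "Beta-cell(Oid)", "BRCA-Beta-cell(oid)"),
    ("10", "Germ-cell(Oid)", "BRCA-Meiotic-prophase-fetal-Germ-cell(oid)"),
    ("11", "Dendritic", "Dendritic-cell"),
    ("12", "T-cell", "NKT-cell"),
    ("13", "Acinar-cell", "Acinar-cell"),
    ("14", "B-cell", "B-cell"),
    ("15", "Dendritic", "Myeloid-dendritic-cell"),
    ("16", "Germ-cell(Oid)", "BRCA-Oogenesis-phase-fetal-Germ-cell(oid)"),
    ("17", "Dendritic", "Plasmacytoid-dendritic-cell"),
    ("18", "Mast-cell", "Mast-cell"),
]
labels = pd.DataFrame(pairs, columns=["leiden", "major_celltype", "minor_celltype"])

tab = pd.crosstab(labels["minor_celltype"], labels["major_celltype"])
n_major_per_minor = (tab > 0).sum(axis=1)
inconsistent = list(n_major_per_minor.index[n_major_per_minor != 1])
print(inconsistent)  # an empty list means the hierarchy is consistent
```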
marker_genes_dict1 = {
'Acinar-cell': ['AZGP1'],
'B-cell':['MS4A1'],
'Beta-cell(oid)':['AC245297.3'],
'Meiotic-prophase-fetal-Germ-cell(oid)':['STAG3L3','SPATA22','NUFIP1'],
'NK-Germ-cell(oid)':['RNU4-2','AC005842.1'],
'Oogenesis-phase-fetal-Germ-cell(oid)':['POMZP3'],
'Dendritic-cell':['LYZ','IL1B'],
'Gamma-delta-T-cell':['KLRB1'],
'Mast-cell':['MS4A2'],
'Myeloid-dendritic-cell':['LAMP3','GADD45A'],
'NK-cell':['CCL5'],
'NKT-cell':['MKI67'],
'Plasmacytoid-dendritic-cell':['GZMB'],
'Treg-cell':['CTLA4'],
'Epithelial': ['EPCAM'],
'Proliferating': ['MKI67'],
'T-cell': ['CD3D'],
'Myeloid':['CD68'],
'Plasmablasts':['JCHAIN'],
'Endothelial':['PECAM1'],
'Mesenchymal':['PDGFRB']
}
sc.pl.dotplot(adata, marker_genes_dict1, 'leiden', dendrogram=True)
This concludes the tutorial on cell type annotation. The workflow is fairly involved, so we encourage you to try it yourself.

Beyond running the package, we also need to investigate why unexpected cells appear and what their biological significance is. Here, for example, we found a group of cells whose marker genes closely resemble those of germ cells. Because these cells were found in breast tissue, they must be somatic cells, so we call them rare cells. Further study of such rare cells may help explain why breast cancer metastasizes, among other questions.

Finally, making the figures visually clear and attractive is another aspect you may want to consider.